 Tempe


My Tesla Was Driving Itself Perfectly--Until It Crashed

The Atlantic - Technology

This article was featured in the One Story to Read Today newsletter. The smell was strange. The concrete wall was too close. One of my kids was standing on the sidewalk next to our car--not crying, just confused. The seat belt had held. The crumple zone had crumpled.


EmDT: Embedding Diffusion Transformer for Tabular Data Generation in Fraud Detection

Kuo, En-Ya, Motsch, Sebastien

arXiv.org Machine Learning

Imbalanced datasets pose a difficulty in fraud detection, as classifiers are often biased toward the majority class and perform poorly on rare fraudulent transactions. Synthetic data generation is therefore commonly used to mitigate this problem. In this work, we propose the Clustered Embedding Diffusion-Transformer (EmDT), a diffusion model designed to generate fraudulent samples. Our key innovation is to leverage UMAP clustering to identify distinct fraudulent patterns, and train a Transformer denoising network with sinusoidal positional embeddings to capture feature relationships throughout the diffusion process. Once the synthetic data has been generated, we employ a standard decision-tree-based classifier (e.g., XGBoost) for classification, as this type of model remains better suited to tabular datasets. Experiments on a credit card fraud detection dataset demonstrate that EmDT significantly improves downstream classification performance compared to existing oversampling and generative methods, while maintaining comparable privacy protection and preserving feature correlations present in the original data.
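
As a rough illustration of the sinusoidal positional embeddings mentioned in the abstract, the sketch below uses the standard Transformer formulation (alternating sine and cosine at geometrically spaced frequencies). The abstract does not say what the positions index or which dimensions are used; treating each tabular feature's column index as its position, and the names `sinusoidal_embedding`, `dim`, and `base`, are assumptions made here for illustration only.

```python
import math


def sinusoidal_embedding(position, dim, base=10000.0):
    """Return the standard sinusoidal embedding of `position` as `dim` floats.

    Even slots hold sin(position * freq_i), odd slots hold cos(position * freq_i),
    with freq_i = 1 / base ** (2 * i / dim).
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    embedding = []
    for i in range(dim // 2):
        freq = 1.0 / (base ** (2 * i / dim))
        embedding.append(math.sin(position * freq))
        embedding.append(math.cos(position * freq))
    return embedding


# Assumption: one embedding per feature column of a hypothetical 4-column table.
feature_embeddings = [sinusoidal_embedding(j, dim=8) for j in range(4)]
```

In a denoising network these vectors would typically be added to (or concatenated with) the per-feature token representations so that attention layers can tell the columns apart; how EmDT actually combines them is not stated in the abstract.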


Learning Generalized Policy Automata for Relational Stochastic Shortest Path Problems

Neural Information Processing Systems

Several goal-oriented problems in the real world can be naturally expressed as Stochastic Shortest Path problems (SSPs). However, the computational complexity of solving SSPs makes finding solutions to even moderately sized problems intractable. State-of-the-art SSP solvers are unable to learn generalized solutions or policies that would solve multiple problem instances with different object names and/or quantities. This paper presents an approach for learning Generalized Policy Automata (GPAs): non-deterministic partial policies that can be used to catalyze the solution process. GPAs are learned using relational, feature-based abstractions, which makes them applicable to broad classes of related problems with different object names and quantities. Theoretical analysis of this approach shows that it guarantees completeness and hierarchical optimality. Empirical analysis shows that this approach effectively learns broadly applicable policy knowledge in a few-shot fashion and significantly outperforms state-of-the-art SSP solvers on test problems whose object counts are far greater than those used during training.
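
For readers unfamiliar with SSPs, the sketch below solves a tiny one with plain value iteration. It is not the GPA learning approach from the paper, nor any particular state-of-the-art solver; it only illustrates what "expected cost to reach a goal under stochastic actions" means. The two-state problem, the action names, and the function `value_iteration` are all hypothetical.

```python
def value_iteration(states, actions, transitions, cost, goal,
                    tol=1e-9, max_iters=10000):
    """Compute expected cost-to-goal for each state of a small SSP.

    transitions[(s, a)] is a list of (probability, next_state) pairs,
    cost[(s, a)] is the immediate cost of taking action a in state s.
    """
    values = {s: 0.0 for s in states}
    for _ in range(max_iters):
        delta = 0.0
        for s in states:
            if s == goal:
                continue  # the goal is absorbing and costs nothing
            best = min(
                cost[(s, a)] + sum(p * values[t] for p, t in transitions[(s, a)])
                for a in actions[s]
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    return values


# Hypothetical problem: "walk" always reaches the goal at cost 3;
# "jump" costs 1 but fails (stays in place) 20% of the time.
states = ["start", "goal"]
actions = {"start": ["walk", "jump"]}
transitions = {
    ("start", "walk"): [(1.0, "goal")],
    ("start", "jump"): [(0.8, "goal"), (0.2, "start")],
}
cost = {("start", "walk"): 3.0, ("start", "jump"): 1.0}

values = value_iteration(states, actions, transitions, cost, goal="goal")
```

Here repeatedly jumping is optimal: its expected cost satisfies V = 1 + 0.2 V, so V = 1.25, which is below the cost of walking. Enumerating every state like this is exactly what becomes intractable as object counts grow, which is the scaling problem the abstract describes.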